Bayesian nonparametric disclosure risk assessment

Abstract

Any decision about the release of microdata for public use is supported by the estimation of measures of disclosure risk, the most popular being the number τ1 of sample uniques that are also population uniques. In such a context, parametric and nonparametric partition-based models have been shown to have: i) the strength of leading to estimators with desirable features, including ease of implementation, computational efficiency and scalability to massive data; ii) the weakness of producing underestimates in realistic scenarios, with the underestimation getting worse as the tail behaviour of the empirical distribution gets heavier. To fix this phenomenon, we propose a Bayesian model that can be tuned to the microdata. Our model relies on the Pitman–Yor process prior, and it leads to a novel estimator with all of these features that, in addition, allows the underestimation to be reduced by tuning a “discount” parameter. We show the effectiveness of our estimator through its application to synthetic data and real data.
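
To make the quantity τ1 concrete, the following is a minimal illustrative sketch in Python; it is not the estimator proposed in the paper. It assumes a synthetic population whose cells are generated by the standard two-parameter Chinese restaurant process associated with the Pitman–Yor process, draws a simple random sample without replacement, and counts τ1 directly from the known population. The function names `pitman_yor_partition` and `count_tau1`, and all parameter values, are hypothetical choices made for this illustration.

```python
import random
from collections import Counter


def pitman_yor_partition(n, discount, concentration, rng):
    """Assign n individuals to cells with the two-parameter Chinese restaurant process.

    Assumes 0 <= discount < 1 and concentration > 0 (illustrative parameter names).
    """
    sizes = []   # sizes[j] = number of individuals already in cell j
    labels = []  # labels[i] = cell of individual i
    for i in range(n):
        k = len(sizes)
        # A new cell is opened with probability
        # (concentration + discount * k) / (concentration + i).
        p_new = (concentration + discount * k) / (concentration + i)
        if rng.random() < p_new:
            sizes.append(1)
            labels.append(k)
        else:
            # An existing cell j is chosen with weight sizes[j] - discount.
            weights = [s - discount for s in sizes]
            j = rng.choices(range(k), weights=weights)[0]
            sizes[j] += 1
            labels.append(j)
    return labels


def count_tau1(population_labels, sample_indices):
    """Return (number of sample uniques, tau1).

    tau1 counts the sample uniques whose cell is also unique in the population.
    """
    pop_freq = Counter(population_labels)
    sample_freq = Counter(population_labels[i] for i in sample_indices)
    sample_uniques = [cell for cell, f in sample_freq.items() if f == 1]
    tau1 = sum(1 for cell in sample_uniques if pop_freq[cell] == 1)
    return len(sample_uniques), tau1


if __name__ == "__main__":
    rng = random.Random(0)
    for discount in (0.0, 0.4, 0.8):
        population = pitman_yor_partition(2000, discount, 10.0, rng)
        sample = rng.sample(range(len(population)), 200)
        n_unique, tau1 = count_tau1(population, sample)
        print(f"discount={discount}: sample uniques={n_unique}, tau1={tau1}")
```

In practice the population frequencies are unknown, which is why τ1 has to be estimated from the sample alone. In this simulation, larger discount values tend to give heavier-tailed cell frequencies and hence more unique cells, which is the regime the abstract identifies as the hardest for existing estimators.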


Similar resources

Bayesian Disclosure Risk Assessment: Predicting Small Frequencies in Contingency Tables

We propose an approach for assessing the risk of individual identification in the release of categorical data. This requires the accurate calculation of predictive probabilities for those cells in a contingency table which have small sample frequencies, making the problem somewhat different from usual contingency table estimation, where interest is generally focussed on regions of high probabil...


Bayesian Nonparametric Disclosure Risk Estimation via Mixed Effects Log-linear Models

Statistical agencies and other institutions collect data under the promise to protect the confidentiality of respondents. When releasing microdata samples, the risk that records can be identified must be assessed. To this aim, a widely adopted approach is to isolate categorical variables key to the identification and analyze multi-way contingency tables of such variables. Common disclosure risk...


Bayesian Nonparametric and Parametric Inference

This paper reviews Bayesian Nonparametric methods and discusses how parametric predictive densities can be constructed using nonparametric ideas.


Disclosure risk estimation via nonparametric log-linear models

A major concern in releasing microdata sets is protecting the privacy of individuals in the sample. Consider a data set in the form of a high-dimensional contingency table. If an individual belongs to a cell with small frequency, an intruder with certain knowledge about the individual may identify him and learn sensitive information about him in the data. To estimate the risk of such breach of ...


Risk Assessment for Toxicity Experiments with Joint Discrete-continuous Outcomes: a Bayesian Nonparametric Approach

We present a Bayesian nonparametric mixture modeling approach to inference and risk assessment for developmental toxicity studies. The primary objective of these studies is to determine the relationship between the level of exposure to a toxic chemical and the probability of a physiological or biochemical response. We consider the general data setting involving clustered categorical responses o...



Journal

Journal title: Electronic Journal of Statistics

Year: 2021

ISSN: 1935-7524

DOI: https://doi.org/10.1214/21-ejs1933